The EDL Pipeline transforms raw API responses into a unified, enriched dataset through a series of orchestrated data transformations. This page traces how data flows from initial API calls through to the final compressed output.

The Central Hub: master_isin_map.json

Every data flow in the pipeline begins with or depends on the master ISIN map created in Phase 1.
{
  "RELIANCE": {
    "ISIN": "INE002A01018",
    "Sid": "11915",
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited"
  },
  "TCS": {
    "ISIN": "INE467B01029",
    "Sid": "11536",
    "Symbol": "TCS",
    "Name": "Tata Consultancy Services Limited"
  }
  // ... 2,775 stocks total
}
Key Fields:
  • ISIN: International Securities Identification Number (used by ALL APIs)
  • Sid: Security ID (required for OHLCV and advanced indicators)
  • Symbol: Stock ticker (used for file naming and CSV matching)
  • Name: Company full name
Why It’s Critical: Every script in Phase 2+ iterates over this map to:
  1. Know which stocks to fetch data for
  2. Match API responses back to symbols
  3. Ensure consistent ISIN → Symbol mapping across all datasets
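A minimal sketch of how a downstream script might consume the map, assuming the file layout shown above. The helper names `load_master_map` and `build_isin_index` are hypothetical and are not taken from the pipeline's source.

```python
import json
from pathlib import Path


def load_master_map(path="master_isin_map.json"):
    """Load the symbol-keyed master map from disk (the path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_isin_index(master_map):
    """Invert the map so API responses keyed by ISIN can be matched to symbols."""
    return {entry["ISIN"]: symbol for symbol, entry in master_map.items()}


# Two entries copied from the example above
sample_map = {
    "RELIANCE": {"ISIN": "INE002A01018", "Sid": "11915",
                 "Symbol": "RELIANCE", "Name": "Reliance Industries Limited"},
    "TCS": {"ISIN": "INE467B01029", "Sid": "11536",
            "Symbol": "TCS", "Name": "Tata Consultancy Services Limited"},
}
isin_to_symbol = build_isin_index(sample_map)
```

Building the reverse index once keeps the ISIN → Symbol lookup constant-time for every response that comes back keyed by ISIN.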

Phase-by-Phase Data Transformation

Phase 1: Core Data Foundation

1. Market Snapshot

Script: fetch_dhan_data.py
API Call:
POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt
{
  "data": {
    "type": "full",
    "whichpage": "nse_total_market",
    "count": 5000
  }
}
Raw Output: dhan_data_response.json (~5 MB)
  • 2,775 stocks with current prices, technical indicators, volume
Derived Output: master_isin_map.json
  • Extracted ISIN, Sid, Symbol, Name for all stocks

2. Fundamental Data

Script: fetch_fundamental_data.py
API Calls: One per stock (2,775 requests)
POST https://open-web-scanx.dhan.co/scanx/fundamental
{"data": {"isin": "INE002A01018"}}
Output: fundamental_data.json (~35 MB)
  • Quarterly results (Net Profit, EPS, Sales, OPM)
  • Annual results (5 years history)
  • Balance sheet data
  • Shareholding patterns
  • Valuation ratios (ROE, ROCE, P/E)
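The one-request-per-stock pattern can be sketched as follows. This only builds the requests and sends nothing; the function names and the `Content-Type` header are assumptions, while the URL and payload shape come from the example above.

```python
import json
import urllib.request

FUNDAMENTAL_URL = "https://open-web-scanx.dhan.co/scanx/fundamental"


def build_fundamental_request(isin):
    """Build (but do not send) the POST request for one stock's fundamentals."""
    body = json.dumps({"data": {"isin": isin}}).encode("utf-8")
    return urllib.request.Request(
        FUNDAMENTAL_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def iter_fundamental_requests(master_map):
    """Yield (symbol, request) pairs, one per entry in the master map."""
    for symbol, entry in master_map.items():
        yield symbol, build_fundamental_request(entry["ISIN"])


requests_by_symbol = dict(iter_fundamental_requests(
    {"RELIANCE": {"ISIN": "INE002A01018"}}
))
```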

3. Listing Dates

Source: NSE Archives CSV download
Output: nse_equity_list.csv
  • Symbol → Listing Date mapping
Data Available After Phase 1:
  • 2,775 ISINs mapped to symbols
  • Current market data (prices, volumes, RSI)
  • Quarterly results plus 5 years of annual fundamentals
  • Listing dates

Phase 2: Data Enrichment (Parallel Fetching)

All scripts in this phase run independently using master_isin_map.json. They can execute in any order (or in parallel).

fetch_company_filings.py

Strategy: Hybrid dual-endpoint fetching
API Calls: 2 per stock × 2,775 = 5,550 requests
Deduplication Logic:
  • By news_id + news_date + caption
  • Keeps most recent 100 filings per stock
Output Structure:
// company_filings/RELIANCE_filings.json
[
  {
    "news_id": "123456",
    "news_date": "2024-01-15",
    "caption": "Reg. 7(2) - Outcome of Board Meeting",
    "pdf_url": "https://..."
  }
]
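The deduplication rule described above (key on news_id + news_date + caption, keep the most recent 100) might look like this. It is a sketch, not the script's actual code: `dedupe_filings` is a hypothetical name, and sorting by `news_date` as a string assumes the ISO date format shown in the example.

```python
def dedupe_filings(filings, limit=100):
    """Drop duplicates by (news_id, news_date, caption), keep the newest `limit`."""
    seen = set()
    unique = []
    for filing in filings:
        key = (filing.get("news_id"), filing.get("news_date"), filing.get("caption"))
        if key not in seen:
            seen.add(key)
            unique.append(filing)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings
    unique.sort(key=lambda f: f.get("news_date") or "", reverse=True)
    return unique[:limit]


# Merged results from the two endpoints, including one exact duplicate
merged = [
    {"news_id": "1", "news_date": "2024-01-15", "caption": "Board Meeting"},
    {"news_id": "1", "news_date": "2024-01-15", "caption": "Board Meeting"},
    {"news_id": "2", "news_date": "2024-02-01", "caption": "Dividend"},
]
deduped = dedupe_filings(merged)
```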
Data Available After Phase 2:
  • Up to 100 regulatory filings per stock
  • 50 news items per stock
  • Technical indicators (Pivots, SMA/EMA)
  • 2 years of corporate action history plus 2 months of upcoming actions
  • Surveillance flags
  • Circuit breaker status
  • Bulk/block deals
  • Price band revisions

Phase 2.5: OHLCV Data (Incremental Download)

API Call (per stock):
POST https://openweb-ticks.dhan.co/getDataH
{
  "SYM": "RELIANCE",
  "SEC_ID": "11915",
  "INTERVAL": "D",
  "START": 215634600,  // Oct 31, 1976 18:30 UTC (Nov 1, 1976 00:00 IST); forces max history
  "END": 1709481600    // Current timestamp
}
Smart Incremental Logic:
  1. Check if ohlcv_data/RELIANCE.csv exists
  2. If yes: Read last date, set START to last date + 1 day
  3. If no: Download from 1976 (full history)
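The three steps above can be sketched as a small helper. The function name `next_start_timestamp` is hypothetical, and treating a header-only CSV as a first-time download is an assumption.

```python
import csv
from pathlib import Path

FULL_HISTORY_START = 215634600  # START value from the request example above
ONE_DAY = 86400                 # seconds in a day


def next_start_timestamp(csv_path):
    """Return START for the next request: last saved Time + 1 day, or full history."""
    path = Path(csv_path)
    if not path.exists():
        return FULL_HISTORY_START
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:  # header-only file: fall back to full history (assumption)
        return FULL_HISTORY_START
    return int(rows[-1]["Time"]) + ONE_DAY
```

For example, `next_start_timestamp("ohlcv_data/RELIANCE.csv")` returns the full-history value when the file is missing and the last stored `Time` plus 86,400 seconds otherwise.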
Performance:
  • First-time: ~30 minutes (2,775 stocks × full history)
  • Incremental: ~2-5 minutes (only new dates)
Output CSV Format:
Time,Open,High,Low,Close,Volume
1609459200,2345.50,2367.80,2340.00,2360.75,12500000
1609545600,2365.00,2380.20,2350.10,2375.40,11800000
Data Available After Phase 2.5:
  • Daily OHLCV data for all stocks (from listing date to today)
  • ~2,775 CSV files in ohlcv_data/ directory

Phase 3: Base Analysis (Creating Master JSON)

Inputs:
  • fundamental_data.json → Financial metrics
  • dhan_data_response.json → Current prices, technical indicators
  • advanced_indicator_data.json → Pivots, SMA/EMA signals
  • nse_equity_list.csv → Listing dates
Processing Steps (Key Transformations):
  1. Quarterly Metrics Extraction:
    • Raw: "NET_PROFIT": "1250.5|1180.2|1090.8|1050.3|1100.1"
    • Extracted:
      • Net Profit Latest Quarter: 1250.5
      • Net Profit Previous Quarter: 1180.2
      • Net Profit Last Year Quarter: 1100.1
    • Calculated:
      • QoQ % Net Profit Latest: ((1250.5 - 1180.2) / 1180.2) × 100 = 5.96%
      • YoY % Net Profit Latest: ((1250.5 - 1100.1) / 1100.1) × 100 = 13.67%
  2. Valuation Ratios:
    • D/E Ratio: Non-Current Liabilities / Total Equity
    • PEG Ratio: P/E / YoY EPS Growth
    • Forward P/E: P/E × (TTM EPS / Annualized Latest EPS)
  3. Shareholding Changes:
    • Raw: "FII": "25.3|24.1"
    • Calculated: FII % change QoQ: 25.3 - 24.1 = 1.2 percentage points
    • Free Float: 100 - Promoter%
    • Float Shares: Total Shares × (Free Float / 100)
  4. Technical Indicator Parsing:
    • SMA Status: “SMA 20: Above (4.9%) | SMA 50: Above (24.1%)”
    • EMA Status: “EMA 20: Above (6.3%) | EMA 200: Above (72.6%)”
    • Technical Sentiment: “RSI: Neutral | MACD: Bearish”
  5. Index Membership:
    • Filters tech.idxlist for specific indices (Nifty 50, Bank Nifty, etc.)
    • Comma-separated list: “NIFTY 50, NIFTY BANK, NIFTY 100”
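The pipe-delimited parsing and the QoQ/YoY arithmetic from transformations 1 and 3 can be reproduced with a short sketch. The helper names are hypothetical, and the position of the last-year quarter (fifth value) is inferred from the worked example above.

```python
def parse_series(raw):
    """Split a pipe-delimited string such as "1250.5|1180.2" into floats, newest first."""
    return [float(part) for part in raw.split("|")]


def pct_change(current, base):
    """Percentage change of `current` relative to `base`; None when base is zero."""
    if base == 0:
        return None
    return (current - base) / base * 100


net_profit = parse_series("1250.5|1180.2|1090.8|1050.3|1100.1")
latest, previous, last_year = net_profit[0], net_profit[1], net_profit[4]
qoq = round(pct_change(latest, previous), 2)   # 5.96
yoy = round(pct_change(latest, last_year), 2)  # 13.67

# Shareholding change is a simple difference in percentage points
fii = parse_series("25.3|24.1")
fii_change = round(fii[0] - fii[1], 2)  # 1.2
```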
Output Structure (60+ fields per stock):
[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Listing Date": "29-NOV-1977",
    "Basic Industry": "Refineries",
    "Sector": "Energy",
    "Market Cap(Cr.)": 1700000,
    "Latest Quarter": "Dec-2023",
    "Net Profit Latest Quarter": 18500,
    "QoQ % Net Profit Latest": 5.11,
    "YoY % Net Profit Latest": 12.3,
    // ... 50+ more fields
  }
]
Data Available After Phase 3:
  • Base JSON with 60+ fields for all 2,775 stocks
  • Identity, Fundamentals, Valuation, Ownership, Technical indicators
  • Ready for in-place enrichment in Phase 4

Phase 4: Enrichment Injection (Sequential Modifications)

Each script in this phase reads all_stocks_fundamental_analysis.json, modifies it in place, and writes it back.
Critical: These scripts must run in the exact order listed because later scripts depend on fields added by earlier ones.
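A minimal sketch of the read-modify-write pattern each Phase 4 script follows. The function name `enrich_in_place` is hypothetical, and writing through a temporary file is an added safety measure that the source does not describe.

```python
import json
import os
import tempfile

MASTER_JSON = "all_stocks_fundamental_analysis.json"


def enrich_in_place(path, enrich_stock):
    """Load the master JSON, apply `enrich_stock` to each record, and write it back."""
    with open(path, encoding="utf-8") as f:
        stocks = json.load(f)
    for stock in stocks:
        enrich_stock(stock)  # mutates the record, e.g. adds new fields
    # Write to a temp file first so a crash cannot leave a half-written master file
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(stocks, f, ensure_ascii=False)
    os.replace(tmp_path, path)
    return len(stocks)
```

A step would then be invoked as `enrich_in_place(MASTER_JSON, add_my_fields)`, where `add_my_fields` is whatever per-stock function that step defines.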

1. Advanced Metrics (OHLCV-based)

Script: advanced_metrics_processor.py
Reads: ohlcv_data/{SYMBOL}.csv for each stock
Calculations:
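import pandas as pd

# Setup (assumed): load one stock's daily candles and pick out the latest values
df = pd.read_csv("ohlcv_data/RELIANCE.csv")
latest, prev = df.iloc[-1], df.iloc[-2]
latest_open, latest_close, latest_volume = latest['Open'], latest['Close'], latest['Volume']
prev_close = prev['Close']

# ATH (All-Time High); the percentage is negative while price is below the ATH,
# matching the "% from ATH" value in the final output
ath = df['High'].max()
pct_from_ath = ((latest_close - ath) / ath) * 100

# ADR (Average Daily Range)
df['Daily_Range_Pct'] = ((df['High'] - df['Low']) / df['Low']) * 100
adr_5 = df['Daily_Range_Pct'].tail(5).mean()
adr_14 = df['Daily_Range_Pct'].tail(14).mean()
adr_20 = df['Daily_Range_Pct'].tail(20).mean()
adr_30 = df['Daily_Range_Pct'].tail(30).mean()

# RVOL (Relative Volume): latest volume vs. the average of the previous 20 sessions
avg_vol_20 = df['Volume'].tail(21).iloc[:-1].mean()
rvol = latest_volume / avg_vol_20

# Gap Up %
gap_up = ((latest_open - prev_close) / prev_close) * 100

# Turnover (Rupee Volume) in crores (1 crore = 10,000,000)
df['Turnover_Cr'] = (df['Close'] * df['Volume']) / 10_000_000
turnover_20 = df['Turnover_Cr'].tail(20).mean()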
Fields Added (17 fields):
  • ATH, % from ATH
  • 5/14/20/30 Days MA ADR(%)
  • RVOL
  • Gap Up %, Day Range %
  • % from 52W Low
  • 6 Month Returns(%)
  • 200 Days EMA Volume
  • % from 52W High 200 Days EMA Volume
  • Daily Rupee Turnover 20/50/100(Cr.)
  • 30 Days Average Rupee Volume(Cr.)

2. Earnings Performance

Script: process_earnings_performance.py
Logic:
  1. Read company_filings/{SYMBOL}_filings.json
  2. Find most recent “Quarterly Results” filing
  3. Extract date and closing price on that day from OHLCV
  4. Calculate returns from earnings day to current price
  5. Find max price since earnings to calculate peak returns
Pseudocode:
results_date = find_latest_quarterly_results_filing(symbol)
results_close = ohlcv_df.loc[results_date, 'Close']
current_price = ohlcv_df.iloc[-1]['Close']

returns = ((current_price - results_close) / results_close) * 100

max_price_since = ohlcv_df[results_date:]['High'].max()
max_returns = ((max_price_since - results_close) / results_close) * 100
Fields Added (3 fields):
  • Quarterly Results Date
  • Returns since Earnings(%)
  • Max Returns since Earnings(%)
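A runnable version of the pseudocode above, using only the standard library. The function name and the tuple layout are hypothetical. Falling back to the next available candle when the results date is not a trading day is an assumption; the source does not say how that case is handled.

```python
from datetime import date


def earnings_returns(candles, results_date):
    """Compute returns since the results date.

    `candles` is a list of (date, high, close) tuples sorted oldest first.
    If the results date is not a trading day, the next available candle is
    used as the base (an assumption, not confirmed by the pipeline source).
    """
    since = [c for c in candles if c[0] >= results_date]
    if not since:
        return None
    base_close = since[0][2]
    current_close = candles[-1][2]
    max_high = max(high for _, high, _ in since)
    return {
        "Returns since Earnings(%)": round((current_close - base_close) / base_close * 100, 2),
        "Max Returns since Earnings(%)": round((max_high - base_close) / base_close * 100, 2),
    }


candles = [
    (date(2024, 1, 12), 101.0, 100.0),
    (date(2024, 1, 15), 104.0, 102.0),  # results day
    (date(2024, 1, 16), 110.0, 108.0),
    (date(2024, 1, 17), 109.0, 105.0),
]
result = earnings_returns(candles, date(2024, 1, 15))
```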

3. F&O Data Enrichment

Script: enrich_fno_data.py
Reads:
  • fno_lot_sizes_cleaned.json (lot size mapping)
  • fno_expiry_calendar.json (next expiry dates)
  • fno_stocks_response.json (F&O stock list)
Logic:
  • If symbol in F&O list → set FNO Flag: Yes
  • Look up lot size from mapping
  • Find next expiry date from calendar
Fields Added (3 fields):
  • FNO Flag (Yes/No)
  • Lot Size
  • Next Expiry (date)
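The lookup logic above can be sketched as follows. The in-memory shapes (a set of F&O symbols, a symbol → lot size dict, a single next-expiry string) are assumptions about how the three input files are loaded, and the `None` values for non-F&O stocks are likewise assumed.

```python
def enrich_fno(stock, fno_symbols, lot_sizes, next_expiry):
    """Add the three F&O fields to one stock record and return it."""
    symbol = stock["Symbol"]
    in_fno = symbol in fno_symbols
    stock["FNO Flag"] = "Yes" if in_fno else "No"
    # Values for non-F&O stocks are an assumption; the source does not specify them
    stock["Lot Size"] = lot_sizes.get(symbol) if in_fno else None
    stock["Next Expiry"] = next_expiry if in_fno else None
    return stock


fno_symbols = {"RELIANCE"}
lot_sizes = {"RELIANCE": 250}
reliance = enrich_fno({"Symbol": "RELIANCE"}, fno_symbols, lot_sizes, "28-Mar-2024")
other = enrich_fno({"Symbol": "EXAMPLE"}, fno_symbols, lot_sizes, "28-Mar-2024")
```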

4. Market Breadth & Relative Strength

Script: process_market_breadth.py
Calculation:
  • Uses return data already in base JSON
  • Computes relative strength rating (1-100)
  • Generates market breadth statistics
Fields Added:
  • Relative Strength Rating
  • Market breadth percentile
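The source does not state how the 1-100 rating is derived. One common approach is a percentile rank over a return field, sketched below purely as an illustration. Both the choice of "1 Year Returns(%)" and the ranking formula are assumptions, not the script's confirmed method.

```python
def relative_strength_ratings(stocks, field="1 Year Returns(%)"):
    """Map each symbol to a 1-100 rating from its percentile rank on `field`."""
    ranked = sorted(
        (s for s in stocks if s.get(field) is not None),
        key=lambda s: s[field],
    )
    n = len(ranked)
    ratings = {}
    for position, stock in enumerate(ranked, start=1):
        # Strongest stock gets 100; the floor of 1 keeps ratings inside 1-100
        ratings[stock["Symbol"]] = max(1, round(position / n * 100))
    return ratings


sample = [
    {"Symbol": "A", "1 Year Returns(%)": -10.0},
    {"Symbol": "B", "1 Year Returns(%)": 5.0},
    {"Symbol": "C", "1 Year Returns(%)": 28.7},
    {"Symbol": "D", "1 Year Returns(%)": 60.0},
]
ratings = relative_strength_ratings(sample)
```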

5. Historical Market Breadth

Script: process_historical_market_breadth.py
Output: Separate time-series file for charting (not added to master JSON)

6. Corporate Events & News (FINAL)

Script: add_corporate_events.py
Aggregation Strategy:
Event Markers Logic:
# Surveillance
if symbol in asm_list and "LTASM" in stage:
    events.append("★: LTASM")

# Upcoming Corporate Actions (within 30 days)
for action in upcoming_actions:
    if action['Symbol'] == symbol:
        if "DIVIDEND" in action['Type']:
            events.append(f"💸: Dividend ({action['Date']})")
        elif "BONUS" in action['Type']:
            events.append(f"🎁: Bonus ({action['Date']})")
        elif "RESULTS" in action['Type'] and within_14_days:
            events.append(f"⏰: Results ({action['Date']})")

# Block Deals (last 7 days)
if symbol in recent_deals:
    events.append("📦: Block Deal")

# Price Band Revision
if symbol in circuit_revisions:
    events.append("#: +/- Revision")
Recent Announcements (Top 5):
filings = load_filings(f"company_filings/{symbol}_filings.json")
top_5 = sorted(filings, key=lambda x: x['news_date'], reverse=True)[:5]

announcements = [
    {
        "Date": filing['news_date'],
        "Headline": filing['caption'],
        "URL": filing['pdf_url']
    }
    for filing in top_5
]
News Feed (Top 5):
news = load_news(f"market_news/{symbol}_news.json")
top_5 = sorted(news, key=lambda x: x['timestamp'], reverse=True)[:5]

news_feed = [
    {
        "Title": item['headline'],
        "Sentiment": item['sentiment'],  # positive/negative/neutral
        "Date": item['date']
    }
    for item in top_5
]
Fields Added (3 compound fields):
  • Event Markers (array of icons/labels)
  • Recent Announcements (array of 5 objects)
  • News Feed (array of 5 objects)
Data Available After Phase 4:
  • Complete JSON with all 86 fields for all 2,775 stocks
  • Ready for compression

Phase 5: Compression

Simple gzip compression of the final JSON:
import gzip

with open("all_stocks_fundamental_analysis.json", "rb") as f_in:
    data = f_in.read()
    
with gzip.open("all_stocks_fundamental_analysis.json.gz", "wb", compresslevel=9) as f_out:
    f_out.write(data)
Compression Results:
  • Raw JSON: ~38 MB
  • Compressed: ~7.5 MB
  • Ratio: ~80% reduction
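For consumers of the compressed file, a sketch of reading it back and checking the size reduction. The helper names are hypothetical; `gzip.open` in text mode is standard library behavior.

```python
import gzip
import json


def load_compressed(path="all_stocks_fundamental_analysis.json.gz"):
    """Read the gzip-compressed master JSON back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def reduction_pct(raw_size, compressed_size):
    """Size reduction as a percentage of the raw size."""
    return (1 - compressed_size / raw_size) * 100
```

With the sizes quoted above, `reduction_pct(38, 7.5)` comes out at roughly 80.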

Final Output Structure

[
  {
    // ─── Identity (6 fields) ───
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Listing Date": "29-NOV-1977",
    "Basic Industry": "Refineries",
    "Sector": "Energy",
    "Index": "NIFTY 50, NIFTY 100, NIFTY ENERGY",
    
    // ─── Valuation (7 fields) ───
    "Market Cap(Cr.)": 1700000,
    "Stock Price(₹)": 2540.75,
    "P/E": 28.5,
    "Forward P/E": 26.2,
    "Historical P/E 5": 0.0,
    "PEG": 2.31,
    "% from 52W High": -8.5,
    
    // ─── Fundamentals - Quarterly (32 fields) ───
    "Latest Quarter": "Dec-2023",
    "Net Profit Latest Quarter": 18500,
    "Net Profit Previous Quarter": 17600,
    "QoQ % Net Profit Latest": 5.11,
    "YoY % Net Profit Latest": 12.3,
    // ... (EPS, Sales, OPM with Latest/Previous/2Q/3Q/LastYr)
    
    // ─── Fundamentals - Ratios (5 fields) ───
    "ROE(%)": 15.2,
    "ROCE(%)": 12.8,
    "D/E": 0.45,
    "OPM TTM(%)": 11.5,
    "Sales Growth 5 Years(%)": 8.7,
    
    // ─── Ownership (4 fields) ───
    "FII % change QoQ": 1.2,
    "DII % change QoQ": -0.5,
    "Free Float(%)": 49.5,
    "Float Shares(Cr.)": 336.5,
    
    // ─── Price Performance (6 fields) ───
    "1 Day Returns(%)": 0.8,
    "1 Week Returns(%)": 2.5,
    "1 Month Returns(%)": 4.2,
    "3 Month Returns(%)": 8.5,
    "6 Month Returns(%)": 15.3,
    "1 Year Returns(%)": 28.7,
    
    // ─── Technical Indicators (6 fields) ───
    "RSI (14)": 62.5,
    "SMA Status": "SMA 20: Above (4.9%) | SMA 50: Above (24.1%)",
    "EMA Status": "EMA 20: Above (6.3%) | EMA 200: Above (72.6%)",
    "Technical Sentiment": "RSI: Neutral | MACD: Bearish",
    "Pivot Point": "2485.50",
    "Gap Up %": 0.3,
    
    // ─── Volatility & Volume (12 fields) ───
    "5 Days MA ADR(%)": 2.1,
    "14 Days MA ADR(%)": 2.3,
    "20 Days MA ADR(%)": 2.4,
    "30 Days MA ADR(%)": 2.5,
    "Day Range(%)": 1.8,
    "RVOL": 1.25,
    "200 Days EMA Volume": 8500000,
    "% from 52W High 200 Days EMA Volume": -15.2,
    "Daily Rupee Turnover 20(Cr.)": 1250,
    "Daily Rupee Turnover 50(Cr.)": 1180,
    "Daily Rupee Turnover 100(Cr.)": 1150,
    "30 Days Average Rupee Volume(Cr.)": 1200,
    
    // ─── Historical Metrics (3 fields) ───
    "ATH": 2975.50,
    "% from ATH": -14.6,
    "% from 52W Low": 35.8,
    
    // ─── Earnings (3 fields) ───
    "Quarterly Results Date": "15-Jan-2024",
    "Returns since Earnings(%)": 3.5,
    "Max Returns since Earnings(%)": 8.2,
    
    // ─── F&O Data (3 fields) ───
    "FNO Flag": "Yes",
    "Lot Size": 250,
    "Next Expiry": "28-Mar-2024",
    
    // ─── Circuit Info (1 field) ───
    "Circuit Limit": "20%",
    
    // ─── Event Markers (1 array field) ───
    "Event Markers": [
      "📊: Results Recently Out",
      "💸: Dividend (15-Mar)"
    ],
    
    // ─── Recent Announcements (1 array field) ───
    "Recent Announcements": [
      {
        "Date": "2024-01-15",
        "Headline": "Outcome of Board Meeting - Quarterly Results",
        "URL": "https://www.bseindia.com/..."
      },
      // ... 4 more
    ],
    
    // ─── News Feed (1 array field) ───
    "News Feed": [
      {
        "Title": "Reliance announces new green energy initiative",
        "Sentiment": "positive",
        "Date": "2024-03-02"
      },
      // ... 4 more
    ]
  }
  // ... 2,774 more stocks
]
Total: 86 fields per stock × 2,775 stocks = 238,650 data points

Data Lineage Summary

Next Steps

  • Output Schema: Detailed breakdown of all 86 fields
  • Pipeline Architecture: Understand the 6-phase design
  • API Endpoints: Complete Dhan API endpoint reference
  • Pipeline Settings: Configure pipeline behavior and flags
  • OHLCV Configuration: Optimize OHLCV download strategy